In this tutorial, we will learn how to work with lidar-derived raster products that represent topography and snow depth. The objectives include:
The prerequisites for this tutorial are basic experience with Python and geospatial data.
We will start by exploring point cloud data and take a quick look at how rasters are created from them. Then we will derive vegetation height and snow depth by differencing elevation models, and analyse these lidar products alongside the DEM. Finally, we will address common issues when working with lidar data from varying sources and point to available snow-focused lidar datasets.
Lidar stands for Light Detection and Ranging, also called 3D laser scanning. It is an active remote sensing technique for measuring elevation across wide areas.
Lidar provides highly accurate elevation data at high resolution.
There are different types of lidar depending on the platform used and on the physical or scattering process measured. Based on the platform, we find three types of lidar:
For this tutorial, we will focus on Airborne LiDAR. Airborne LiDAR technology is used to measure topography using a laser beam directed towards the ground with GPS and IMU systems providing the location and orientation of the airborne platform.
Lidar records two types of data: range and intensity. Range is a function of the two-way travel time of the pulse to the ground and back. Intensity is the fraction of photons returned, which is controlled by the material properties of the target.
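As a quick illustration of the range measurement (the numbers here are made up, not from the tutorial data), range is simply half the two-way travel time multiplied by the speed of light:

```python
# Convert a lidar pulse's two-way travel time into range (distance to target).
C = 299_792_458  # speed of light (m/s)

def travel_time_to_range(t_seconds):
    """Range is c * t / 2, since the pulse travels to the target and back."""
    return C * t_seconds / 2

# A pulse returning after ~6.671 microseconds corresponds to a target ~1000 m away
r = travel_time_to_range(6.671e-6)
print(round(r))  # 1000
```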
Each lidar pulse yields multiple returns. The number and complexity of returns provide information about the surface material.
Depending on the feature the point return is coming from, elevation models can be Digital Terrain Model (DTM) or Digital Surface Model (DSM). DTM is the three-dimensional representation of the terrain without natural or man-made objects. DSM is a three-dimensional representation of the heights of the Earth's surface, including natural or man-made objects located on it.
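To make the DTM/DSM relationship concrete, here is a toy one-dimensional profile (values invented for illustration) showing that subtracting the bare-earth DTM from the DSM leaves only the height of objects above the terrain:

```python
# Toy elevation profiles: the DSM includes canopy/buildings, the DTM is bare earth.
dsm = [100.0, 105.0, 112.0, 101.0]  # surface elevations (m)
dtm = [100.0, 101.0, 102.0, 101.0]  # terrain elevations (m)

# Object height (e.g. canopy) = DSM - DTM
heights = [s - t for s, t in zip(dsm, dtm)]
print(heights)  # [0.0, 4.0, 10.0, 0.0]
```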
In this activity, we will open a .las file, in the plas.io free online lidar data viewer and explore some attributes associated with a lidar data point cloud.
Load the data: under File, select Autzen Stadium. To navigate around, left-click and drag to tilt up and down, right-click and drag to move around, and use the scroll wheel to zoom in and out.
Adjust Intensity
Change the lidar point cloud color options to classification
Explore
Point clouds give us a lot of information (x, y, z and intensity). However, these data can be too large, overwhelming and difficult to use for many scientific applications. Lidar data are therefore often provided in raster format, which is easier to work with in many GIS tools. Rasters can be generated from point clouds using PDAL in two stages: filtering and rasterizing. The filtering pipeline involves steps such as reprojection, reclassification, noise removal, outlier removal, segmentation and extraction. The filtered point cloud can then be rasterized using a preferred algorithm. See this post for the detailed process.
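As a sketch of what such a PDAL pipeline can look like, the stage names below follow PDAL's documented filters and writers, but the filenames, EPSG code and parameter values are illustrative placeholders, not settings from this tutorial's data:

```python
import json

# Illustrative PDAL pipeline: reproject, remove statistical outliers,
# keep only ground-classified returns, then rasterize a DTM.
pipeline = {
    "pipeline": [
        "input.las",  # placeholder input file
        {"type": "filters.reprojection", "out_srs": "EPSG:32613"},
        {"type": "filters.outlier", "method": "statistical"},
        {"type": "filters.range", "limits": "Classification[2:2]"},  # ground points
        {
            "type": "writers.gdal",
            "filename": "dtm.tif",
            "resolution": 1.0,
            "output_type": "mean",
        },
    ]
}
print(json.dumps(pipeline, indent=2))
```

Running such a pipeline requires PDAL (e.g. `pdal pipeline pipeline.json` on the command line, or the `pdal` Python bindings).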
The Canopy Height Model (CHM) is a popular raster product from lidar data. It is derived by differencing the DSM and the bare-earth DEM (DTM). Snow depth is another raster product of interest; it is derived by differencing a snow-on DEM and a snow-off DEM.
We will difference the appropriate elevation model to derive vegetation height and snow depth. To begin, let's load the necessary packages for this tutorial.
#import packages
import os
import numpy as np
#plotting packages
import matplotlib.pyplot as plt
import seaborn as sns
import hvplot.xarray
#geospatial packages
import geopandas as gpd #for vector data
import xarray as xr
import rioxarray #for raster data
import rasterstats as rs
The original data are permanently stored on Zenodo: https://zenodo.org/record/6789541
For the Hackweek we provide a faster download alternative via the 'tutorial-data' repository. This link may not be available in the future; if that happens, replace the github.com link below with this: https://zenodo.org/record/6789541/files/input.zip
%%bash
# Retrieve a copy of data files used in this tutorial from Zenodo.org:
# Re-running this cell will not re-download things if they already exist
cd /tmp
wget -q -nc -O input.zip "https://github.com/snowex-hackweek/tutorial-data/releases/download/SnowEx_2022_lidar/input.zip"
unzip -q -n input.zip
#read the snow_on and snow_free DEM
cam_dem_snowOn = rioxarray.open_rasterio('/tmp/input/Cameron_Winter_DEM.tif', masked = True)
cam_dem_snowFree = rioxarray.open_rasterio('/tmp/input/Cameron_Summer_DEM.tif', masked = True)
#derive snow depth
cam_snow_depth = cam_dem_snowOn - cam_dem_snowFree
#read the snow_free DSM to calculate vegetation height
cam_dsm_snowFree = rioxarray.open_rasterio('/tmp/input/Cameron_Summer_DSMs.tif', masked = True)
#derive vegetation height
cam_vh = cam_dsm_snowFree - cam_dem_snowFree
What does the DataArray object look like?
cam_snow_depth
<xarray.DataArray (band: 1, y: 15118, x: 9600)>
array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
Coordinates:
* y (y) float64 4.492e+06 4.492e+06 ... 4.484e+06 4.484e+06
* x (x) float64 4.223e+05 4.223e+05 ... 4.271e+05 4.271e+05
* band (band) int64 1
spatial_ref int64 0
The data has about 145 million values. Let's check the shape, CRS, extent and resolution, and make a quick interactive plot.
print('The shape of the data is:', cam_snow_depth.shape)
print('\nCRS of the data is:', cam_snow_depth.rio.crs)
print('\nThe spatial extent of the data is:', cam_snow_depth.rio.bounds())
print('\nThe resolution of the data is:', cam_snow_depth.rio.resolution())
The shape of the data is: (1, 15118, 9600)

CRS of the data is: PROJCS["unnamed",GEOGCS["Unknown datum based upon the GRS 1980 ellipsoid",DATUM["Not_specified_based_on_GRS_1980_ellipsoid",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6019"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4019"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-105],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]

The spatial extent of the data is: (422266.5, 4484116.0, 427066.5, 4491675.0)

The resolution of the data is: (0.5, -0.5)
Let's do a quick plot of the snow depth product
cam_snow_depth.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis', height = 500)
Let's subset over a small area for faster plotting and further analysis.
cam_dem_roi = cam_dem_snowFree.sel(y = slice(4489168, 4487655), x = slice(425077, 425848))
cam_vh_roi = cam_vh.sel(y = slice(4489168, 4487655), x = slice(425077, 425848))
cam_sd_roi = cam_snow_depth.sel(y = slice(4489168, 4487655), x = slice(425077, 425848))
# Set font size and font family
plt.rcParams.update({'font.size': 18})
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = ['Times New Roman'] + plt.rcParams['font.serif']
#create fig and axes elements
fig, axs = plt.subplots(ncols = 3, figsize=(18, 12))
#plot the data
cam_dem_roi.plot(ax = axs[0], cmap = 'Greys', robust = True, cbar_kwargs={'label': 'Elevation (m)', 'orientation':'horizontal'})
cam_vh_roi.plot(ax = axs[1], cmap = 'summer_r', robust = True, cbar_kwargs={'label': 'Vegetation Height (m)', 'orientation':'horizontal'})
cam_sd_roi.plot(ax = axs[2], cmap = 'Blues', robust = True, cbar_kwargs={'label': 'Snow Depth (m)', 'orientation':'horizontal'})
#Set the title
# fig.suptitle('Cameron Study Area', fontsize = 20)
#set the axes title and tick locators
axs[0].set_title('')
axs[0].xaxis.set_major_locator(plt.LinearLocator(3))
axs[1].set_title('')
axs[1].xaxis.set_major_locator(plt.LinearLocator(3))
axs[2].set_title('')
axs[2].xaxis.set_major_locator(plt.LinearLocator(3))
plt.tight_layout()
plt.show()
Lidar provides highly accurate DEM, vegetation height and snow depth. Our goal here is to quickly demonstrate some of the analysis we can do with these products. We will sample snow depth, elevation and vegetation height at 100 locations to quantify snow distribution over the subset region.
#import point location data
cam_100_points = gpd.read_file('/tmp/input/cameron_100_points.gpkg')
#create a dir to store the output
if not os.path.exists("./output"):
os.makedirs("./output")
#create a buffer around the point feature and save to file
cam_100_points.buffer(20).to_file('./output/cam_100_points_buffer.gpkg')
Let's see the sampled points
cam_100_poly = gpd.read_file('./output/cam_100_points_buffer.gpkg')
fig, ax = plt.subplots(figsize=(10,12))
cam_sd_roi.plot(ax = ax)
cam_100_poly.plot(ax = ax, color = 'black')
plt.show()
#Extract raster values for the vegetation height
cam_VH_stat = rs.zonal_stats('./output/cam_100_points_buffer.gpkg',
cam_vh_roi.squeeze().values,
nodata = -9999,
geojson_out=True,
affine = cam_vh_roi.rio.transform(),
stats = 'count mean')
cam_VH_stat_df = gpd.GeoDataFrame.from_features(cam_VH_stat) #turn the geojson to a dataframe
cam_VH_stat_df.rename(columns = {'mean': 'VH_mean', 'count': 'VH_count'}, inplace = True) #rename the columns
#Extract raster values for the snow depth
cam_SD_stat = rs.zonal_stats(cam_VH_stat_df,
cam_sd_roi.squeeze().values,
nodata = -9999,
geojson_out=True,
affine = cam_sd_roi.rio.transform(),
stats = 'count mean')
cam_SD_stat_df = gpd.GeoDataFrame.from_features(cam_SD_stat) #turn the geojson to a dataframe
cam_SD_stat_df.rename(columns = {'mean': 'SD_mean', 'count': 'SD_count'}, inplace = True) #rename the columns
#Extract raster values for elevation
cam_DEM_stat = rs.zonal_stats(cam_SD_stat_df,
cam_dem_roi.squeeze().values,
nodata = -9999,
geojson_out=True,
affine = cam_dem_roi.rio.transform(),
stats = 'count mean')
cam_DEM_stat_df = gpd.GeoDataFrame.from_features(cam_DEM_stat) #turn the geojson to a dataframe
cam_DEM_stat_df.rename(columns = {'mean': 'ELEV_mean', 'count': 'ELEV_count'}, inplace = True) #rename the columns
#rename the dataframe
vh_sd_dem = cam_DEM_stat_df
# Set font size and font family
plt.rcParams.update({'font.size': 18})
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = ['Times New Roman'] + plt.rcParams['font.serif']
#create figure and gridspec object
fig = plt.figure(figsize=(20,12), constrained_layout=True)
gspec = fig.add_gridspec(ncols=5, nrows=5)
ax0 = fig.add_subplot(gspec[0, 0])
ax1 = fig.add_subplot(gspec[0, 1])
ax2 = fig.add_subplot(gspec[0, 2])
ax3 = fig.add_subplot(gspec[1:5, 0])
ax4 = fig.add_subplot(gspec[1:5, 1])
ax5 = fig.add_subplot(gspec[1:5, 2])
ax6 = fig.add_subplot(gspec[0:5, 3])
ax7 = fig.add_subplot(gspec[0:5, 4])
# plot the boxplot of elevation, vegetation and snow thickness
sns.boxplot(x= vh_sd_dem['ELEV_mean'], ax=ax0, palette= 'colorblind')
ax0.set_xlabel('')
sns.boxplot(x= vh_sd_dem['VH_mean'], ax=ax1, palette= 'colorblind')
ax1.set_xlabel('')
sns.boxplot(x= vh_sd_dem['SD_mean'], ax=ax2, palette= 'colorblind')
ax2.set_xlabel('')
# plot the histogram and boxplot of elevation, vegetation and snow thickness
sns.histplot(data = vh_sd_dem, x ='ELEV_mean', ax=ax3, bins=20)
ax3.set_xlabel('Elevation (m)')
sns.histplot(data = vh_sd_dem, x ='VH_mean', ax=ax4, bins=20, binrange=(1,10))
ax4.set_xlabel('Vegetation Height (m)')
sns.histplot(data = vh_sd_dem, x ='SD_mean', ax=ax5, bins=30, binrange=(0,1.5))
ax5.set_xlabel('Snow Depth (m)')
#make density plots of snow depth vs elevation and snow depth vs vegetation height
sns.kdeplot(data = vh_sd_dem, x = 'SD_mean', y = 'ELEV_mean', fill = True, ax = ax6, cmap = 'Blues')
ax6.set_xlabel('Snow Depth (m)')
ax6.set_ylabel('Elevation (m)')
#set the x and y limits
#ax6.set_xlim(0, 2.5)
ax6.set_ylim(2900, 3300)
sns.kdeplot(data = vh_sd_dem, x = 'SD_mean', y = 'VH_mean', fill = True, ax = ax7, cmap = 'Blues')
ax7.set_xlabel('Snow Depth (m)')
ax7.set_ylabel('Vegetation Height (m)')
#set the x and y limits
#ax7.set_xlim(0, 2.5)
ax7.set_ylim(-3, 15)
plt.show()
The lidar data we have used so far were acquired from the same source (QSI). That made the DEM differencing quite straightforward, because the CRS, resolution and extent are the same. At times, the elevation data might come from different sources, which requires the extra effort of resampling and reprojection to accurately derive snow depth. Let's do DEM differencing again, but using snow-on and snow-off DEMs from different sources. The snow-on data is a 0.5 m resolution raster acquired by QSI over Grand Mesa in 2020. A 3 m DEM acquired by ASO in 2016 will be used as the snow-off DEM.
As a first step, we have to downsample the QSI data to the resolution of the ASO.
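Conceptually, downsampling aggregates each block of fine-resolution cells into one coarse cell (a 0.5 m to 3 m resample corresponds to 6x6 blocks). A pure-Python sketch of a block mean on a made-up grid, using a 2x2 block for brevity:

```python
# Toy block-mean downsampling: aggregate a fine grid into coarse cells.
fine = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
factor = 2  # each coarse cell averages a factor x factor block of fine cells

coarse = [
    [
        sum(fine[r + i][c + j] for i in range(factor) for j in range(factor)) / factor**2
        for c in range(0, len(fine[0]), factor)
    ]
    for r in range(0, len(fine), factor)
]
print(coarse)  # [[1.0, 2.0], [3.0, 4.0]]
```

Note that `gdal_translate` below uses its default nearest-neighbour resampling; an averaging resample could be requested with `-r average`.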
!gdal_translate -tr 3 3 /tmp/input/QSI_0.5m_GrandMesa13Feb_Winter_DEM.tif ./output/QSI_3m_GrandMesa13Feb_Winter_DEM.tif
Input file size is 45728, 21442
0...10...20...30...40...50...60...70...80...90...100 - done.
#check the raster metadata
!gdalinfo ./output/QSI_3m_GrandMesa13Feb_Winter_DEM.tif
Driver: GTiff/GeoTIFF
Files: ./output/QSI_3m_GrandMesa13Feb_Winter_DEM.tif
Size is 7621, 3574
Coordinate System is:
PROJCRS["unnamed",
BASEGEOGCRS["Unknown datum based upon the GRS 1980 ellipsoid",
DATUM["Not specified (based on GRS 1980 ellipsoid)",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4019]],
CONVERSION["Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-111,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (737387.500000000000000,4330284.500000000000000)
Pixel Size = (3.000000000000000,-3.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 737387.500, 4330284.500) (108d15'19.09"W, 39d 5'21.85"N)
Lower Left ( 737387.500, 4319562.500) (108d15'32.54"W, 38d59'34.42"N)
Upper Right ( 760250.500, 4330284.500) (107d59'28.62"W, 39d 4'58.38"N)
Lower Right ( 760250.500, 4319562.500) (107d59'43.36"W, 38d59'11.03"N)
Center ( 748819.000, 4324923.500) (108d 7'30.87"W, 39d 2'16.69"N)
Band 1 Block=7621x1 Type=Float32, ColorInterp=Gray
NoData Value=-3.4028235e+38
#read the data as arrays
snow_on_dem = rioxarray.open_rasterio('./output/QSI_3m_GrandMesa13Feb_Winter_DEM.tif', masked =True)
snow_on_dem.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis')
#read the snow-off DEMs
snow_free_dem = rioxarray.open_rasterio('/tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif',
masked=True)
snow_free_dem.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis')
The extent and CRS of the snow-off DEM differ from those of the snow-on DEM. Let's match the snow-off DEM to the snow-on DEM.
snow_free_dem_match = snow_free_dem.rio.reproject_match(snow_on_dem)
snow_free_dem_match.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis')
snow_depth = snow_on_dem - snow_free_dem_match
snow_depth.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis')
About 17 m snow depth.... What's the issue?
!gdalsrsinfo -o proj4 /tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif
+proj=utm +zone=13 +datum=WGS84 +units=m +no_defs
!gdalsrsinfo -o proj4 ./output/QSI_3m_GrandMesa13Feb_Winter_DEM.tif
The snow-on and snow-free DEMs are referenced to different datums, so we need to do a vertical datum transformation. Matching the vertical datum of data from different sources is important for elevation data.
The vertical datum of the ASO data is the WGS 84 ellipsoid. The QSI data are in UTM Zone 12 North, with NAD83 (2011) as the horizontal datum. NAD83 is a 3-D reference system that serves as the horizontal control datum for the United States, Canada, Greenland and Central America, using the Geodetic Reference System 1980 (GRS 80) ellipsoid, an earth-centered reference ellipsoid. See this link for more reading. The QSI vertical datum is NAVD88 (Geoid12b).
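The key relationship behind the transformation is h = H + N, where h is the ellipsoidal height, H the orthometric (geoid-referenced) height, and N the geoid undulation at that location. A minimal sketch with invented numbers (in practice N comes from a geoid model grid such as egm96_15.gtx):

```python
# Ellipsoidal vs orthometric height: h = H + N
N = -17.3               # geoid undulation at this point (m), hypothetical value
H_orthometric = 3050.0  # geoid-referenced elevation (m), hypothetical value

h_ellipsoidal = H_orthometric + N
print(h_ellipsoidal)  # 3032.7
```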
Let's transform ASO to same datum as QSI.
!ls /tmp/input/egm96_15.gtx
#coordinate transformation
!gdalwarp -s_srs "+proj=utm +zone=13 +datum=WGS84 +units=m +no_defs" -t_srs "+proj=utm +zone=12 +ellps=GRS80 +units=m +no_defs +geoidgrids=/tmp/input/egm96_15.gtx" /tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif ./output/ASO_3M_GrandMesa_Summer_DEM_geoid_96.tif
Processing /tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif [1/1] : 0
Using internal nodata values (e.g. -9999) for image /tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
#read the snow-off DEMs
snow_free_dem = rioxarray.open_rasterio('./output/ASO_3M_GrandMesa_Summer_DEM_geoid_96.tif',
masked=True)
#reproject the ASO DEM to match the QSI DEM
snow_free_dem_match = snow_free_dem.rio.reproject_match(snow_on_dem)
snow_depth = snow_on_dem - snow_free_dem_match
snow_depth = snow_depth.where((snow_depth > 0) & (snow_depth < 3))
snow_depth.hvplot.image(x= 'x', y = 'y', rasterize = True, cmap = 'viridis')
GDAL uses the PROJ library to carry out CRS transformations. PROJ uses pre-defined grids for datum transformations, and vertical datum transformations are defined using a GTX file. In the example above, when we transformed to the EGM96 vertical CRS, PROJ used the parameters supplied in the egm96_15.gtx file. Many such grid files are included when you download and install GDAL, but others are very large and are not included by default. If you want to convert to a vertical datum whose grid files are not included, you need to download them separately and copy them to the appropriate directory on your system. Learn more about PROJ Resource files.
One such vertical CRS is EGM2008 (EPSG:3855). Say you wanted to transform WGS84 heights to this newer, more precise geoid model. You could run a command such as the one below.
gdalwarp -s_srs "+proj=longlat +datum=WGS84 +no_defs" -t_srs "+proj=longlat +datum=WGS84 +no_defs +geoidgrids=egm08_25.gtx" /tmp/input/ASO_3M_GrandMesa_Summer_DEM.tif ./output/ASO_3M_GrandMesa_Summer_DEM_geoid_08.tif
But if you run it, you may get an error such as below
ERROR 1: Cannot open egm08_25.gtx.
This is because the EGM2008 grid is very large and is not included in many GDAL installations. To fix this, you can download the grid file from http://download.osgeo.org/proj/vdatum/
Copy the egm08_25.gtx file to the PROJ resource directory on your computer. The location of this directory depends on your platform and installation method. Some common paths are below:
Windows: C:\OsGeo4W64\share\proj\
Mac: /Library/Frameworks/PROJ.framework/Resources/proj/
Linux: /usr/share/proj/
Another potential source of error in lidar differencing is coregistration. If the differenced values still don't make sense after matching the CRS and extent, there might be a misregistration issue.
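A crude way to check for horizontal misregistration is a brute-force search for the integer pixel shift that minimizes the mean squared elevation difference between the two grids (real workflows use dedicated coregistration tools; this toy sketch on made-up data just illustrates the idea):

```python
# Brute-force search for the (dy, dx) pixel shift that best aligns two grids.
def best_shift(ref, mov, max_shift=2):
    """Return the (dy, dx) offset minimizing mean squared difference
    between ref and mov over the interior of the grids."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            diffs = [
                (ref[r][c] - mov[r + dy][c + dx]) ** 2
                for r in range(max_shift, rows - max_shift)
                for c in range(max_shift, cols - max_shift)
            ]
            mse = sum(diffs) / len(diffs)
            if best is None or mse < best[0]:
                best = (mse, dy, dx)
    return best[1], best[2]

# Synthetic test: mov is ref shifted right by one column
ref = [[float(r * 10 + c) for c in range(8)] for r in range(8)]
mov = [[ref[r][max(c - 1, 0)] for c in range(8)] for r in range(8)]
print(best_shift(ref, mov))  # (0, 1)
```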
Various lidar datasets have been acquired to monitor snow over the western US.
2021 Quantum Spatial data over selected areas. The lidar data for 2020 and 2021 cover the locations shown in the map below and include:

References: